Multi-objective discounted dynamic programming: The Neighbour Search approach to construct Pareto sets of multi-objective Markov Decision Processes

Authors

  • Gianluca Dorini
  • Dragan Savić
Abstract

The Neighbour Search (NS) algorithm is an iterative method for constructing Pareto sets of multi-dimensional polytopes. An NS iteration consists of two steps: Edges Exploration and Neighbour Detection. Edges Exploration takes a Pareto vertex and determines all Pareto edges connecting that vertex to its neighbours. Each neighbour is itself a Pareto vertex, obtained by Neighbour Detection. The procedure continues until all Pareto vertices have been explored. The purpose of this paper is to describe in detail the application of NS to Markov Decision Processes (MDPs) with N discounted objectives. Novel numerical techniques are developed to adapt Edges Exploration and Neighbour Detection effectively to the characteristics of MDPs. Edges Exploration consists of solving a redundancy-removal problem for systems of linear inequalities in N dimensions; the number of inequalities is equivalent to the size of the MDP. Neighbour Detection is performed either by Direct Neighbour Search (DNS) or by Cross Neighbour Search (CNS). The former requires the Bellman equation to be solved, albeit with a reduced action set. The latter does not require the Bellman equation to be solved and is computationally linear in the size of the MDP, and is thus more efficient than DNS. However, CNS requires conditions that are not always fulfilled, whereas DNS is always applicable. Experimental results suggest that the conditions for CNS to be applicable are in fact satisfied in most NS iterations.

Gianluca Dorini, Department of Environmental Engineering, Technical University of Denmark, Miljøvej 113, DK-2800 Kongens Lyngby, Denmark

Dragan Savić, Centre for Water Systems, School of Engineering, Computing and Mathematics, University of Exeter, Harrison Building, North Park Road, Exeter EX4 4QF, UK
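To make the notion of a Pareto vertex of a multi-objective discounted MDP concrete, the following minimal Python sketch may help. It is not the NS algorithm, nor its Edges Exploration or Neighbour Detection steps: it substitutes plain weighted-sum scalarisation, solving the Bellman equation for a fixed, strictly positive weight vector and then evaluating the vector of discounted returns of the resulting deterministic policy. The toy MDP (`P`, `R`, `GAMMA`) and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical toy MDP with 2 states, 2 actions and 2 objectives.
# P[s][a] is a list of (next_state, probability) pairs.
# R[s][a] is the reward vector (one entry per objective).
P = [
    [[(0, 1.0)], [(1, 1.0)]],   # state 0: action 0 stays, action 1 moves to state 1
    [[(0, 1.0)], [(1, 1.0)]],   # state 1: action 0 moves to state 0, action 1 stays
]
R = [
    [(1.0, 0.0), (0.0, 0.0)],   # staying in state 0 pays objective 1
    [(0.0, 0.0), (0.0, 1.0)],   # staying in state 1 pays objective 2
]
GAMMA = 0.5  # discount factor (assumed value)


def scalarized_policy(w, tol=1e-12):
    """Value iteration on the weighted-sum reward; returns a greedy policy."""
    n = len(P)
    v = [0.0] * n
    while True:
        # Q-values of the scalarised problem for the current value estimate.
        q = [[sum(wi * ri for wi, ri in zip(w, R[s][a]))
              + GAMMA * sum(p * v[t] for t, p in P[s][a])
              for a in range(len(P[s]))] for s in range(n)]
        new_v = [max(row) for row in q]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return [row.index(max(row)) for row in q]
        v = new_v


def vector_value(policy, sweeps=200):
    """Iterative policy evaluation, one discounted return per objective."""
    n, k = len(P), len(R[0][0])
    v = [[0.0] * k for _ in range(n)]
    for _ in range(sweeps):
        v = [[R[s][policy[s]][j]
              + GAMMA * sum(p * v[t][j] for t, p in P[s][policy[s]])
              for j in range(k)] for s in range(n)]
    return v


if __name__ == "__main__":
    for w in [(0.9, 0.1), (0.1, 0.9)]:
        policy = scalarized_policy(w)
        print(w, policy, vector_value(policy)[0])
```

Each strictly positive weight vector yields one supported Pareto-optimal value vector for the chosen start state. Sweeping over weights in this way can revisit the same vertex many times and gives no adjacency information, which is the gap that a vertex-to-neighbour exploration such as the one described in the abstract is meant to address.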

Similar resources

A Tabu Search Method for a New Bi-Objective Open Shop Scheduling Problem by a Fuzzy Multi-Objective Decision Making Approach (RESEARCH NOTE)

This paper proposes a novel bi-objective mixed-integer mathematical programming model for an open shop scheduling problem (OSSP) that minimizes the mean tardiness and the mean completion time. To obtain the efficient (Pareto-optimal) solutions, a fuzzy multi-objective decision making (fuzzy MODM) approach is applied. By the use of this approach, the related auxiliary single objective formulation can...


DYNAMIC PERFORMANCE OPTIMIZATION OF TRUSS STRUCTURES BASED ON AN IMPROVED MULTI-OBJECTIVE GROUP SEARCH OPTIMIZER

This paper presents an improved multi-objective group search optimizer (IMGSO), based on Pareto theory and designed to handle multi-objective optimization problems. The optimizer includes improvements in three areas: the transition-feasible region is used to address constraints, the Dealer’s Principle is used to construct the non-dominated set, and the producer is updated using a tab...


Model and Solution Approach for Multi objective-multi commodity Capacitated Arc Routing Problem with Fuzzy Demand

The capacitated arc routing problem (CARP) is one of the most important routing problems, with many real-world applications. In some of these applications, such as urban waste collection, decision makers have to consider more than one objective and investigate the problem under uncertainty, where required edges have demand for more than one type of commodity. So, in thi...


The Integrated Supply Chain of After-sales Services Model: A Multi-objective Scatter Search Optimization Approach

In recent decades, the high profits of extended warranties have led third-party firms to regard them as a lucrative after-sales service. However, the segmentation of customers by risk aversion and the effect of offering an extended warranty on the manufacturer’s basic warranty should be investigated when adjusting such services. Since risk-averse customers welcome extended warranty, while the cu...


A Multi Objective Fibonacci Search Based Algorithm for Resource Allocation in PERT Networks

The problem we investigate deals with the optimal assignment of resources to the activities of a stochastic project network. We seek to minimize the expected cost of the project, which includes the sum of resource utilization costs and lateness costs. We assume that the work content required by the activities follows an exponential distribution. The decision variables of the model are the allocated resourc...


Publication date: 2009